Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Fix projection pushdowns in actor pool project #2680

Merged
merged 4 commits into from
Aug 24, 2024

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Aug 17, 2024

Adds some fixes for performing projection pushdowns performed by actor pool project

@jaychia jaychia force-pushed the jay/stateful-udf-logical-opt branch from 772c9a2 to 19dca2d Compare August 19, 2024 14:56
@jaychia jaychia force-pushed the jay/app-project-pushdowns branch 2 times, most recently from cfeb1bc to 2040828 Compare August 20, 2024 14:30
@jaychia jaychia changed the base branch from jay/stateful-udf-logical-opt to main August 20, 2024 14:30
@github-actions github-actions bot added the enhancement New feature or request label Aug 20, 2024
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just two non-blocking comments.

I'm not the most familiar with how the actor pool project works, but I assume your optimizations, as explained, make sense, so based on that the code looks good to go.

Comment on lines 247 to 254
let required_column_names = projection
.projection
.iter()
.flat_map(get_required_columns)
.collect_vec();
let distinct_required_column_names =
required_column_names.iter().collect::<IndexSet<_>>().len();
if required_column_names.len() == distinct_required_column_names {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comment but a possible way to do the same check that may be more performant is to iterate through the required columns and put them into a set after checking if they already exist in there. That way you can early terminate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, modified!

.arced();
let plan = LogicalPlan::Project(Project::try_new(
plan,
vec![col("udf_results_0"), col("udf_results_1")],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add some aliased columns in here too, since they do not perform computation so you expect them to also be merged with the ActorPoolProject correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call-out, added it into the test and verified that the alias gets correctly absorbed into the ActorPoolProject

Copy link

codspeed-hq bot commented Aug 24, 2024

CodSpeed Performance Report

Merging #2680 will not alter performance

Comparing jay/app-project-pushdowns (3d77075) with main (bf5c853)

Summary

✅ 10 untouched benchmarks

@jaychia jaychia enabled auto-merge (squash) August 24, 2024 01:49
@jaychia jaychia merged commit 20ffed4 into main Aug 24, 2024
46 checks passed
@jaychia jaychia deleted the jay/app-project-pushdowns branch August 24, 2024 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants